
Discrimination of English to other Indian languages (Kannada and Hindi) for OCR system



Abstract

India is a multilingual, multi-script country. In every state of India two languages are in use: the state's local language and English. For example, in Andhra Pradesh, a state in India, a document may contain text words in both English and Telugu script. For Optical Character Recognition (OCR) of such a bilingual document, the script of each text word must be identified before the word is fed to the OCR engine for that script. In this paper, we introduce a simple and efficient script identification technique for Kannada, English and Hindi text words in a printed document. The proposed approach discriminates the three scripts using horizontal and vertical projection profiles, with features extracted from the horizontal projection profile of each text word. We analysed 700 different Kannada, English and Hindi words to extract the discriminating features and to build the knowledge base. The proposed system was tested on 100 different document images containing more than 1000 text words of each script, and classification rates of 98.25%, 99.25% and 98.87% were achieved for Kannada, English and Hindi respectively.
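To make the term concrete, a horizontal projection profile counts the foreground pixels in each row of a word image, and a vertical profile does the same per column. The following is a minimal sketch in Python with NumPy, assuming a word image that is already segmented and binarised (non-zero means ink). The function names and the two example features (`peak_position`, `peak_fill`) are hypothetical illustrations and are not the features described in the paper, which the abstract does not specify.

```python
import numpy as np


def horizontal_projection(word: np.ndarray) -> np.ndarray:
    """Number of foreground (non-zero) pixels in each row of a binary word image."""
    return (word > 0).sum(axis=1)


def vertical_projection(word: np.ndarray) -> np.ndarray:
    """Number of foreground (non-zero) pixels in each column of a binary word image."""
    return (word > 0).sum(axis=0)


def peak_features(word: np.ndarray) -> dict:
    """Illustrative features only (an assumption, not the paper's feature set).

    peak_position: relative height of the densest row (0.0 = top, 1.0 = bottom).
    peak_fill:     fraction of the word width covered by ink in that row.
    """
    profile = horizontal_projection(word)
    height, width = word.shape
    peak_row = int(profile.argmax())
    return {
        "peak_row": peak_row,
        "peak_position": peak_row / (height - 1) if height > 1 else 0.0,
        "peak_fill": float(profile[peak_row]) / width,
    }


# Toy 8x10 "word": a full-width stroke near the top plus two vertical stems.
toy = np.zeros((8, 10), dtype=np.uint8)
toy[1, :] = 1
toy[2:7, 2] = 1
toy[2:7, 6] = 1

print(horizontal_projection(toy))
print(vertical_projection(toy))
print(peak_features(toy))
```

A full-width stroke near the top of a word, like the headline that joins the characters of a Devanagari (Hindi) word, shows up as a single dominant peak in the upper part of the horizontal profile. A classifier built on such profiles would compare features of this kind against values learned from labelled sample words.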
